Level-3 Cholesky Kernel Subroutine of a Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm

Authors

  • Fred G. Gustavson
  • Jack J. Dongarra
Abstract

The TOMS paper "A Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm" by Andersen, Gunnels, Gustavson, Reid, and Waśniewski used a level-3 Cholesky kernel subroutine instead of the level-2 LAPACK routine POTF2. We discuss the merits of this approach and show that, on a variety of common platforms, it considerably improves performance over POTRF when POTRF is restricted to calling only POTF2.
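To make the role of POTF2 concrete, here is a minimal sketch (not the authors' kernel) of a blocked, right-looking lower Cholesky factorization in the style of LAPACK POTRF. The diagonal block is factored by an unblocked, column-at-a-time routine in the spirit of POTF2. The rest of the work is done by block updates that play the roles of the level-3 TRSM and SYRK/GEMM calls. The names `potf2_like` and `blocked_cholesky` and the block size `nb` are hypothetical. The code uses plain Python lists, so it only illustrates the structure and says nothing about performance. The approach described above replaces the `potf2_like` step with a level-3 kernel.

```python
import math


def potf2_like(a, k0, k1):
    """Unblocked lower Cholesky of the diagonal block a[k0:k1][k0:k1], in place.

    Hypothetical stand-in for LAPACK POTF2: one column at a time (level-2 style).
    Contributions from columns before k0 are assumed to be already subtracted.
    """
    for j in range(k0, k1):
        # Diagonal entry: subtract squares of L(j, k0:j) computed so far.
        s = a[j][j] - sum(a[j][k] * a[j][k] for k in range(k0, j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(s)
        # Entries below the diagonal inside the block.
        for i in range(j + 1, k1):
            t = a[i][j] - sum(a[i][k] * a[j][k] for k in range(k0, j))
            a[i][j] = t / a[j][j]


def blocked_cholesky(a, nb=2):
    """Blocked right-looking Cholesky A = L * L^T, lower triangle, in place.

    Only the lower triangle of `a` is read; the upper triangle is zeroed at
    the end so that `a` holds L. Returns `a`.
    """
    n = len(a)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1. Factor the diagonal block: A11 = L11 * L11^T (the POTF2 step).
        potf2_like(a, k0, k1)
        # 2. Panel solve (TRSM-like): L21 = A21 * L11^{-T}, row by row.
        for i in range(k1, n):
            for j in range(k0, k1):
                t = a[i][j] - sum(a[i][k] * a[j][k] for k in range(k0, j))
                a[i][j] = t / a[j][j]
        # 3. Trailing update (SYRK/GEMM-like): A22 -= L21 * L21^T, lower part.
        for i in range(k1, n):
            for j in range(k1, i + 1):
                a[i][j] -= sum(a[i][k] * a[j][k] for k in range(k0, k1))
    # Clear the strictly upper triangle so the result is L alone.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a
```

With `nb >= n` the whole factorization goes through `potf2_like`. With smaller `nb`, most of the floating-point work moves into the block updates, which is where level-3 BLAS would be called in a real implementation.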

Similar resources

High Performance Cholesky Factorization via Blocking and Recursion That Uses Minimal Storage

We present a high performance Cholesky factorization algorithm, called BPC for Blocked Packed Cholesky, which performs better than or equivalently to the LAPACK DPOTRF subroutine but with about the same memory requirements as the LAPACK DPPTRF subroutine, which runs at level 2 BLAS speed. Algorithm BPC only calls DGEMM and level 3 kernel routines. It combines a recursive algorithm with blocking and ...

LAPACK Cholesky Routines in Rectangular Full Packed Format

We describe a new data format for storing triangular and symmetric matrices called RFP (Rectangular Full Packed). The standard two dimensional arrays of Fortran and C (also known as full format) that are used to store triangular and symmetric matrices waste half the storage space but provide high performance via the use of level 3 BLAS. Packed format arrays fully utilize storage (array space) b...

Rectangular Full Packed Format for LAPACK Algorithms Timings on Several Computers

We describe a new data format for storing triangular and symmetric matrices called RFP (Rectangular Full Packed). The standard two dimensional arrays of Fortran and C (also known as full format) that are used to store triangular and symmetric matrices waste nearly half the storage space but provide high performance via the use of level 3 BLAS. Standard packed format arrays fully utilize storage...

Optimizing Locality of Reference in Cholesky Algorithms

This paper presents the principal ideas involved in hierarchical blocking, introduces the block packed storage scheme, and gives the implementation details and the performance rates of the hierarchically blocked Cholesky factorization. In some cases the newly developed routines are faster by an order of magnitude than the corresponding LAPACK routines. Introduction: Most current computers based ...

New Generalized Data Structures for Matrices Lead to a Variety of High Performance Algorithms

We describe new data structures for full storage of general matrices that generalize the current storage layouts of the Fortran and C programming languages. We also describe new data structures for full and packed storage of dense symmetric/triangular arrays that generalize both full and packed storage. Using the new data structures, one is led to several new algorithms that save “half” the sto...

Publication date: 2008